So we are here to just make the point. Well make the point, yeah. Make the status about the shot detection. To so you have applied the shot detector to some of the CINETIS video, or at least one, and uh so we can basically maybe just comment on on this, maybe quickly remind what the shot detector is how the shot detector is working, basically. Uh we might be able anyway to give you some reference. I mean your report it's it should be described there or yeah, yeah. You have the report? Okay. To Olivier, yeah. Ah, okay, yes, the kayak wi I was not even with the other one, okay. By the way, I was uh going to ask, so I had seen the video, i you uh did um a downsampling of the video, right? I mean For uh browsing I mean. Okay, yeah. So this is hierarchical normally, so you have scenes and shots, but Because for instance we could probably, you know, like group all the initial ones into one scene and maybe after, when there's some things inside, it s should be another scene, so. Which are extracted I think uh Downsampling. Yeah. Yeah, the quality is that's why I was asking that. It it's not this that you processed, right? Okay, okay. So here we see that we the shot at eighteenth and after twenty twos and after again. And uh if we go down, I mean maybe in terms of shot here, we can see that Right, I mean so now we see shot seven I suppose, yeah. Uh yeah, okay, shot nine currently, shot ten probably. Eleven or twelve. No, eleven. Or no this is ten indeed, okay. Yeah. Yeah, this one, it's eleven, twelve or fourteen seconds. Yeah, okay. Okay. Yeah, it's inside and yeah. It's really a dissolve or is it really like is you have the feeling that it's both Okay. But it's it's it's not standard man-made movies, because I would say most of the time people couldn didn't do any dissolve. Yeah, I could see here, I mean, that it was like horse it was only one single one. N yeah, yeah. This is the one I I had seen, yeah. 
But as I said, even with the motion detector, when it's black there is no structure, so you don't really estimate motion. Well, there is no structure. There is no texture, if you want to say so, and this is not so good for the motion. But for instance between the horse and the cows, yeah, maybe we could have detected things. But it's not even sure, because the background is quite similar, so it's not that sure. So, yeah. The histogram, we have experience with it. You can scramble the image and you get the same histogram, so there is no structure indeed. So of course when you look at the horse and then at the cows, you have black at the top and you also have maybe eighty percent of green, which is not even green there but grass. You have stones, so of course there are differences, because on the one hand you have the horse, but it's black, so you have more sky, which is black on the right, so you don't care about this. It's a global histogram, so yeah. But for this application I could imagine that it would be quite high. And it's a threshold, but still it's adaptive, right, if I remember well, yeah. Because of course if you compute the distance between the two histograms, I think this was Bhattacharyya? Right. So now we know. And so if you take these two with the Bhattacharyya, then of course you could just apply a threshold. When there is motion you will see a lot of things changing, so by using an adaptive threshold, if you see that there is a continuous change of the distance, you don't say it's a shot. 
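A minimal sketch of the point made above, that a global histogram ignores spatial structure: scrambling the pixels leaves the Bhattacharyya distance at zero. The bin count, the grey-level input and the exact (bounded, square-root) form of the distance are assumptions for illustration, not taken from the report.

```python
import math
import random

def histogram(pixels, bins=8, max_value=256):
    """Normalised global histogram of a flat list of grey levels."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    total = len(pixels)
    return [c / total for c in counts]

def bhattacharyya_distance(h1, h2):
    """sqrt(1 - BC), with BC the Bhattacharyya coefficient; stays in [0, 1]."""
    bc = sum(math.sqrt(a * b) for a, b in zip(h1, h2))
    return math.sqrt(max(0.0, 1.0 - bc))

if __name__ == "__main__":
    rng = random.Random(0)
    frame = [rng.randrange(256) for _ in range(1000)]
    scrambled = frame[:]
    rng.shuffle(scrambled)  # same pixels, different positions
    print(bhattacharyya_distance(histogram(frame), histogram(scrambled)))
```

Because only the pixel counts per bin enter the distance, two shots with similar colour content (for instance mostly grass and dark sky) can look nearly identical to this measure even when the subject changes.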
It's only if you have an identified peak, which is fifty percent or maybe two times higher than the second-highest value in the window, then you say it's a shot. Right, I don't know. Yeah, yeah. It's continuous for dissolves, yeah. But anyway it's difficult in general, because sometimes you have some dissolves you could imagine are part of a scene, but of course when you have cuts they are easy to detect, right. So with the motion, basically, the motion detector tries to estimate the motion between two frames, right, if my image is here and after it's there, and it looks at how many points are well matched. So you can see, like with somebody speaking, that before the shot change there are usually at least ninety percent of the points, right, that fit the estimated motion. But then suddenly, if you go from one frame to the next and you try to match all the points with a translation, plus a zoom, plus maybe some rotation, you can't match anything well, right, so basically you would see that maybe only ten percent of the points can be matched well, so it's easy to detect. And after you can see that indeed, because it's a long shot and it's almost like a flat plane, you can do a very good registration: the number of points that are matching is very high. And here with the histogram you can see it's very flat, but of course when you go from this histogram to this one, which is essentially green, you have this big measure, which is here the chi-square... I don't know whether this is what you implemented... no, it's Bhattacharyya. Okay. Yeah, and you have maybe one image with the adaptive threshold, I think. You had something like this, right? Just to explain how... Yeah, you see, like here. Yeah. 
Alpha times the maximum of the other ones. Of the other ones. So of course if you are not the maximum inside the window, you are anyway not going to be above alpha times the max of the others, right, because alpha is larger than one. Like for instance here I see six, but I don't know whether this was for Bhattacharyya or for chi-square. Yeah. Two point four, yeah, so it's a bit less because the amplitude is much smaller, yeah. In the shot we've seen, especially with chi-square, the amplitude is much bigger than with Bhattacharyya, right. You can see that it could go to seven hundred, and when there is nothing, it can be zero. With Bhattacharyya it's between zero and one, so you don't have so much amplitude. But still, in some of these videos, for instance, if we want to go now to the kayak case, I don't know. At some point it is targeting two kayaks and it's moving quite fast towards other people. I don't know what happened there, if it was detected as a shot transition or... Yeah, yeah. It was detected as a shot, okay. I was wondering, yeah, maybe. But here for instance you understand, so we can show this tomorrow to the people at the CINETIS meeting. Right. Yeah, okay, we need a connection, that's true. Yeah, I can bring my laptop if necessary. I'll bring it, okay, so it's done. Well yeah, but normally there is a burning, maybe you can play the shot or... It's strange, I mean, if you play the video, at the beginning, I don't know what's happening. Ah, you didn't generate the Real Media, okay. Well no, I mean here I can see that it was detected as a single shot. Okay. 
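A small sketch of why the two distances need different values of alpha: a chi-square distance computed on raw bin counts is unbounded and grows with the number of pixels, while the Bhattacharyya distance on normalised histograms stays between zero and one. The symmetric chi-square form used here is an assumption; the transcript does not say which variant was implemented.

```python
import math

def chi_square(counts1, counts2):
    """Symmetric chi-square distance on raw bin counts (unbounded)."""
    total = 0.0
    for a, b in zip(counts1, counts2):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return total

def bhattacharyya_distance(counts1, counts2):
    """Bhattacharyya distance on normalised histograms (between 0 and 1)."""
    n1, n2 = sum(counts1), sum(counts2)
    bc = sum(math.sqrt((a / n1) * (b / n2)) for a, b in zip(counts1, counts2))
    return math.sqrt(max(0.0, 1.0 - bc))

if __name__ == "__main__":
    before = [100, 0]   # hypothetical two-bin histograms
    after = [0, 100]
    print(chi_square(before, after))              # grows with pixel count
    print(bhattacharyya_distance(before, after))  # capped at 1
```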
I was because here you can see that the motion uh here is uh, you know, he is uh doing a zoom on the two kayaks here. Yeah, and then he's doing quite a fast transition from these to another one, and I was wondering whether so if you well though of course it's true that it's mainly water, so maybe the colour is indeed uh quite but you know, as a kayak is entering quite fast, I mean you can have quite maybe good distance. But I see that it seems to be how many shots do you have indeed de de detected? Six, yeah, the one is uh maybe so you did it with with what by the way? It's with the images I gave you or okay. Ah, to regenerate an A_V_I_ file, okay, yeah, yeah. Okay. Okay, yeah. Yeah. Yeah, I mean initially I don't know why there is a a shot detected. Ah okay, so the okay so the here is an image number. In your programme. Ah okay, okay, yeah. Okay, okay so then this So, but it's in seconds or it's Okay. Okay, yeah, okay, okay, seconds plus number. Okay, so basically what has been uh yeah, extracted is yeah, this point rather than okay, so this is normal, okay. Okay. Okay, like transition, something that is okay. Yeah, it's motion yeah, it's not a cut. Yeah, but I was thinking that maybe it could have been. But it's true that it's mainly water when we see things, so it's, yeah, it's not such a big uh but yeah, sometimes I mean you can have various things hap I mean just think about more action movie and thing like this. Well sometimes it's very difficult, so focus, usually there is no problem, people say this is a cut, this is not a cut, but like for dissolve even sometimes, you know, people may it's people may have some trouble I mean this is a dissolve this is not this should this should be labelled as a transition or not. Some people will say yeah, but it's part of the thing, or uh for instance in like in um also in uh, come on, uh advertising usually. 
Because the shots are very short in advertising, usually it's uh very fast and so sometimes, yeah, when you have an incrustation like last time, what what should you say, it's it's, yeah, it's a change of But uh Yeah, I mean it's a so sometimes very difficult. A at least what I can see that currently like for CINETIS uh for instance if it's to do colour correction or for instance if it's too black, too a bit whiter, I think this type of things should be fine. Because basically it's even if they are not real shots, they are not so interested if it's not real shot. Maybe after they have this and they may say okay, just by looking at least the summary, so maybe for the summary we could do better uh key frame extraction for instance, but by just looking at the key frames and so on they may say okay, from this shot to this shot we apply like uh specific um colour enhancement algorithm or restoration algorithm or uh to improve uh the brightness and something like this right. And uh so I think for them this is fine um after i yeah. Well, maybe after I mean they would sh okay, uh they would go directly and say select the shots from here to here, and they would say you apply this algorithm, yeah. So of course we could try to make scene clustering and so on, but I'm not even sure that this is necessary for them at this point, because uh if they have a six minute movie, if they have thirty shots, for them it might be easier to look at from here to here I apply this, and they can do it quite quickly. And they have to do it an to do it anyway, because what's going to take time might be uh to select the type of restoration algorithm or the type of things. So this might take more time than having to click two times on two shots rather tha tha than uh having directly a scene, right? So so I think for them uh this type of things is going to be useful uh, at least for this task. If they want to do stabilisation, it may not be sure of that. 
Uh this is the right segmentation for some of these, but so but this is another point which for motion stabilisation, as I said several time, we are not we did not commit to provide anything, but of course if we can provide things, we'll do it, but uh yeah. So do you have uh more question or uh do you have a Well I just i i i it's I think this is Yeah, but after I mean it may be more like a guy um i int i in an interface, a guy interface. I mean where I mean if second select from shot one to seven, uh this is a same seg they may do the grouping by themselves quite by looking at this it may take w w less one one minutes. And I think it's anyway, if it's too summarised after they may want to go inside uh to see whether there is not special things I mean, so. Uh because after it depends, I mean if you give only one key frame per shot, after I mean it's one hundred seventy images, it's quite fast, right, and so I think and the hierarchical thing is good, because still you can check if you have some ambiguities you are not sure, you could Yeah. Yeah. Yeah, at least yeah, yeah, yeah. Yeah, yeah, yeah. Well, ultim ultimately I think this is the type of things, but at this point we uh yeah. I I hope uh that uh if it's a big success uh they will ask IDIAP to to provide something like this of course with some money or at least to pay a student or somebody for doing it, but uh yeah, of course this was ultimately I think one way of uh if you could provide like chapters or something like this, indeed you could provide like the A_V_I_ to the clients, and he may ask for special services or uh and even to just to watch the movie, to have direct access to different parts and so on. But it's uh I mean uh this labelling might be more easy to do for the client by the client, because he knows what they're I mean he we wants to have this and this and uh right, so. Yeah, yeah, he may want to organi yeah, yeah. 
Otherwise, I mean this but of but of course it might be easy to do a very simple software, which would allow the user to select just the shots and give names, and after it would generate automatically some index and uh as a I suppose the client might be interested in such a such a things if he could uh generate automatically on the D_V_D_. But for and you could imagine that before sending in the D_V_D_, you could allow him to l to watch this type of interface, and he would be able to do the selection, say this is this, this is this, this is this, this is this, and when he would receive the D_V_D_, he would indeed uh get already the chapters that he decided to put there. Yeah, yeah, yeah. And I think this may it it's good that this is a demo like this, because I'm sure that they might be interested indeed, because I imagine that some people Yeah. Yeah, yeah, yeah. Uh yeah, so I think this is uh this is a very good uh and a yeah. But uh So this is a kind of service, I don't know, either they give it for free and uh But uh But you see that sometimes even here it's a bit like uh it's for instance if people would like to have like some motion compensation, uh you i it could be proposed. But for instance here it I wou it would be problematic to have motion. So i here it's fine to do motion compensation, but stabilisation rather than yeah. But For instance here it's it's it's it's Effects? Okay. Ah okay, so maybe like here for instance it is a just reproduced two times uh. Yeah. So this is vi okay. This video is about four minutes, five minutes, right? Ah okay, so this is not the one that has been uh Okay. Okay. It's it's it's real time, right? I mean the processing. Yeah. A bit more than real time? Well anyway, yeah. Yeah, normally it could it could go faster. 
But i probably it's probable that even for the project you would do some of the reprogramming, I mean to especially if uh we need to interface new uh um yeah, or for just reading, I mean the the D_V_X_ and thing like this, so it's uh uh so tomor anyway for this we don't know exactly yet, because it's not sure what they want to what the user's tools for reading D_V_D_ and so on, so. And s because I don't think this is so difficult, now you have um already looked at the histograms and open C_V_ computing distances and so on. So if we want to do a special thing that probably runs fast, it may be easier to do it directly and uh yeah. But uh but yeah, the interface is very so the interface uses the output of your programme. Essentially, okay. Okay. Mm. You can do the get frame. But otherwise it's not necessary. Oh yeah, no no no. Ye okay. Oh no no, you need you need to generate these images, right? Okay. Okay. Okay, yeah. Yeah. As a directory, yeah, yeah. Yeah. Yeah, yeah. Yeah. X_M_L_, yeah. On the shot. And it's a simple format I suppose, like uh like you have scenes and shots and uh key s some time stamps. Yeah. So but you're yeah. Yeah, I mean it it might be good to And who has a right to put who has a right to put data on the M_M_M_? Okay. Okay, yeah. Yes. Ah, okay, yeah. Yeah, if yo If you have the same uh decomposition right? Yeah. Yeah, but it may happen that uh for ninety fra ninety percent of the frame it's different, so you see it fine, but suddenly you have one other frame, they don't know what it is and where it comes from. Yeah, yeah. Yeah, yeah. Oh, you mean bac because it's it's it's the M_M_M_ cache? Okay. So this means that when it's done, if I go on my computer and I do the same, it's not going to enter in my cache, but it's going to be already there, in M_M_M_, okay. So it's th it's so it's not done on it's so it's not done online. The first time, yeah, but after it's it take okay. Okay, okay. 
I thought it was everything was generated on the fly th so Okay. Okay, yeah. Okay, so Yeah. Yeah, yeah, yeah. So could you say uh video name CINETIS slash something? Or not? You th okay. You think it would work, okay. Hmm. Mm even if you would r change a C_G_I_ for instance. Let's assume that we say we don't do any scene thing, then we could remove like for instance the scenes, right? I was thinking that here it says that you have online video structure correction. Uh which means that Okay. Yeah. Uh. This is wish wishable uh functions. Okay, to do this. Okay. Okay. Okay. Okay. No, I was thinking No, because I was thinking if it's there, I mean we could use it uh uh we could uh you see but uh yeah. No, no. Fro from outside we can view them. Yeah, from home I was able to access them. No. Okay. Nobody actually it has been downloaded several times and people there are some lawyers that are putting no. But yeah. Y Yeah yeah, yeah yeah, no no, yeah. No. But yeah, it's good to Yeah, if we i yeah. It might be good indeed if we start d building some experiments like this still to have a password. Right, yeah, yeah. Yeah yeah, yeah yeah. Hmm. Yeah, yeah what's, yeah, yeah. Hmm. Yeah, it was done indeed f to be more general than just to do the shot detection, so that's why the code is a bit more complex, I think, currently. Uh but uh Yeah. Yeah, yeah. Just Yeah. Ah okay, yeah yeah. Yeah, it was the same the same ones as the one you saw uh in my code, yeah. Yeah, yeah. Yeah. Yeah, yeah, yeah, yeah, yeah, definitely, yeah. Yeah, uh okay. Yeah, I think it's it's good, yeah. Okay, that's good, yeah. Yeah. Yeah, but why is after, I mean if there is a shot you don't know between because you dropped one frame, you don't know where is the sh uh the d the shot. Sliding window. Yeah, right. Yeah. I don't know, I'm I'm sure that they will be well I hope they will be a of something like this, yeah. The output. Yeah. This generates yeah. Ho how do you generate the Real Media, yeah. Okay. 
Oh o o, oh yeah, you mean the Java framework, uh what uh Mike presented. We cannot play uh Real Media, okay. Okay. Formats, formats, yeah. Okay, so I think this was good to see the the tool and uh tomorrow we are going to re-say most of the things, but shorter I f I believe, right, I mean it's bad to explain. And I think maybe later w yeah, we we can start doing simple Yeah, yeah, yeah. Yeah, the full uh Yeah, mi this might be interesting to see tomorrow. Of course maybe there was the ah, one hundred seventy shots, or it's a lot, but uh after I mean at this point we do we have nothing we can do unless they know what they want to do with it, right? Yeah, yeah. Yeah, but I think I think is it's too it's going very be very to be very good for them to visualise how you can organise things and maybe what they can do, how they and so this is important. Mm-hmm, yeah. Yeah, initially yeah. Well, we didn't because you tried to do something Okay. Yeah, like for instance if we need to find sub-shots, usually it it for instance is better to have the whole shot to decide on how to build the sub-shots. But it's true that maybe I if you store all the distance at that the distances we could do some basic stuff. Okay. Yeah yeah, yeah. Yeah yeah, yeah. Yeah, yeah. That's the thing, yeah. Ah, th the gap you mean or no? Oh no, you mean in the Bhattacharya distance. Yeah. Yeah, yeah. Yeah. Yeah. Yeah. Okay. So. It's okay.
Oh yeah. You don't put your headset microphone? Okay. Yeah. Yeah. Yeah, I have actually yeah, the we can see the report here and the demo, and the goal of uh this meeting is to present the video shot detector to Olivier, okay. Okay. So yeah, it was my project um during my internship at IDIAP under the supervision of Jean-Marc. And as you saw as you saw in the email I sent uh okay. I didn't send this video is the new one. I just um uh processed from your images. Yeah. But uh let's let's start by the video from Friday. CINETIS demo one. Yeah. So this was uh Okay. Yeah, here on the left you have different things, but uh as applied So this detector uh detected the thirty shots, and if you click on uh plus button you can expand it and see the key frames that are in fact uh five extracted frames from the shot. But I think th uh as it is, it's t extracted uh without uh using sub-shots, uh it's simply uh five down-sam yeah, down-sampling. Um so I we can see the video actually, uh the entire video, can open with Real Player. I hope it will work. So we don't have the the sound, and it won't be grabbed the video won't uh I don't know if uh Okay, it's not good quality. Oh yeah, yeah, yeah. No no no no no no. It's in Real Media, so it's very compressed. Uh okay. Uh ten yeah. And then there's um Okay. Yes. And this one, yeah, it's a dissolve it's a dissolve, so it doesn't detect yeah, when I saw the M_ with M_ Player, it's a dissolve, yeah. Yeah. Okay. Yeah. I don't know if this one if we see was correctly uh, you see? Yeah. Yeah. But there is a moon. But okay, because there is no motion in fact between the two shots. Yeah, okay, okay. The cow? Yeah. Mm-hmm. Yeah, maybe to explain this method was using uh a simple uh uh distance between uh consecutive histograms, and we didn't use motion features at all for that, that's why i we Yeah, okay. Enough Yeah. Yeah. With a sliding window. I can show you uh Yeah. Yeah. 
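A sketch of the key-frame choice described above, where each shot is summarised by five frames obtained by simple down-sampling rather than by sub-shot analysis. Taking the centre of five equal segments is an assumption; the function name and the frame-index interface are hypothetical.

```python
def keyframe_indices(first_frame, last_frame, count=5):
    """Pick `count` evenly spaced frame numbers inside a shot.

    Each index is the centre of one of `count` equal segments, so the
    middle index can also serve as the single key frame of the shot.
    """
    length = last_frame - first_frame + 1
    count = min(count, length)  # very short shots yield fewer frames
    return [first_frame + (2 * i + 1) * length // (2 * count)
            for i in range(count)]

if __name__ == "__main__":
    print(keyframe_indices(0, 99))
```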
I don't know if we want to use the whiteboard, or here... this part about shot boundary detection is in French only, unfortunately, but this image shows the results when you use the motion features, and here on the right is when using the histogram distance. And the motion features can better show what's going on when there is such a difficult transition, while with the histogram, yeah, it just shows that there is a shot change at the end. But even for dissolves, I don't have examples here, but there is not a strict transition. It's continuous for dissolves. Yeah. No no, now it's... well, for the video you saw it's Bhattacharyya. Yeah. Okay, at the beginning. So the idea is having a sliding window. We already have all the data that is on the graph. And then to detect the changes, we do a calculation with the points within this sliding window. And we calculate... so what do we do here? In fact we want to detect that this point in the middle is higher than... Exactly, yeah. Alpha. Alpha times the maximum of the other ones. If you are the maximum already, at this point we simply check if this value is above alpha times the max of the other points, let's say these points here. Maybe for chi-square it's six, but for Bhattacharyya, for the tests I did, it's two point four. Yeah. Okay. Mm. In the first shot? Yeah, I saw that it was detected as a shot, yeah. You want to see now? Okay. This is an IDIAP laptop, so you can... okay. So here in fact after twelve... I think it's twelve frames. Yeah. Oh yeah, maybe you're burning, but... Oh, there is no... I didn't have time to generate the Real Media. But yeah, it's complicated. Yeah, here we can see that, yeah. At two point four. 
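A minimal sketch of the sliding-window test described above: a frame is reported as a cut only if its distance exceeds alpha times the maximum of the other distances in the window. The value 2.4 is the one quoted for the Bhattacharyya tests; the window half-width and the function name are assumptions.

```python
def detect_cuts(distances, half_window=5, alpha=2.4):
    """Return indices whose distance is an isolated peak in its window.

    distances[i] is the histogram distance between frame i and frame i + 1.
    """
    cuts = []
    n = len(distances)
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        others = distances[lo:i] + distances[i + 1:hi]
        if not others:
            continue
        # With alpha > 1 this can only hold if distances[i] is the window maximum.
        if distances[i] > alpha * max(others):
            cuts.append(i)
    return cuts

if __name__ == "__main__":
    hard_cut = [0.1] * 10 + [0.9] + [0.1] * 10
    dissolve = [0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0.1, 0.1]
    print(detect_cuts(hard_cut))   # one isolated peak
    print(detect_cuts(dissolve))   # gradual change, nothing reported
```

The second example shows why dissolves are missed: the distance changes continuously, so no single value stands far enough above its neighbours.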
Mm. And then there is translation on the right, yeah. Mm mm mm. I can see uh there is another programme that can see Uh the N_ uh It's I think it's f there there are five normally. If we put expand all Yeah. The other one I had problems converting uh uh the video into D_V_X_, and then uh so I used the programme from uh software from Sebastien, P_ P_A_P_M_ to A_V_I_. Because yeah, one thing uh maybe we will talk about that is the library uh, it's uh it doesn't uh compile anymore on Debian machines and, yeah, some maybe some work has to be done on this. I don't know why this uh image doesn't appear, this is the um um the frame that is uh uh extracted to be the shot key frame here. This one is this one, the middle one. So I don't know what why it doesn't appear, but it just uh a simple problem in the get frame. But what I can show you now is a little programme if we want to be um very precise about so it's ma get frames Okay, C_G_I_ here. And we were interested in around what di do you want? Around six, seven seconds? Okay. Uh if we do next. Oh. Yeah. There is something wrong. So it detects Yeah, in fact, yeah, that's why I said it's it's a frame number. Uh w I don't know why, yeah yeah yeah. It's a frame number. This is a frame number, this is in second, but I I kept the frame number. It's twelve frames, zero seconds. Yeah. Yeah. Yeah. So if we do every second uh I d the middle frame is uh this one thousand Okay. Do you thi it's another kind of browsing page, but uh to be more precise about uh to to check maybe the this yeah, exactly, yeah, yeah. Anyway, you go back to one frame around f one sixty for instance, I don't know. And and there is yeah, that's it, you know. Prev Okay. You see that there is uh okay, it's not a shot change, it's a motion actually here. Yeah, yeah, yeah, yeah, yeah. Yeah, I s yeah. Yeah. When you have yeah, the the goal results uh then yeah, in fact it's not a shot change. Yeah. Yeah. So this would ma be made uh manually actually. 
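A short sketch of the conversion between the frame number kept by the programme and a position in seconds plus frame offset, as discussed above. The frame rate of 25 frames per second is an assumption; the transcript does not state it.

```python
def frame_to_timestamp(frame_number, fps=25):
    """Split an absolute frame number into (seconds, frame within second)."""
    seconds, frame_in_second = divmod(frame_number, fps)
    return seconds, frame_in_second

if __name__ == "__main__":
    print(frame_to_timestamp(12))    # twelve frames, zero seconds
    print(frame_to_timestamp(160))   # a bit more than six seconds at 25 fps
```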
When you say from this shot to this shot is using a graphical user interface someone who would do manually for each video. Method, yeah. Yeah, mm. Yeah. Ah, okay. Yeah, yeah, yeah. Mm. Okay. This is the one in the D_V_D_? Maybe okay, we can uh see a little bit if you want, to play the Well no, it's okay. Okay, uh in fact I didn't understand that, but now I understand better that CINETIS wants this kind of method to help them to restore and to apply the algori some algorithms to to to make a better movie, okay. It's not for the client, for the user who will uh maybe okay. Okay. Also but okay. Okay. Mm-hmm. Mm-hmm. Mm-hmm. Yeah, uh okay. Okay. Ah, okay, yeah yeah yeah. Because he did the movie, so he wants t okay, to archive, he wants to okay. This is yeah, another mm. Yeah. Yeah. Yeah. Okay. Mm-hmm. And this could be done by through internet. Mm. Mm-hmm. From the client point of view and from the preparation of the movie D_V_D_, mm. Okay, I will play, yeah. We had an arrow here or what? It's Th If I go to menu there is something? Do you know what this is? No, old menu uh there is no menu in the D_V_D_? Okay. Okay. Okay. So this is the same video a the the A_V_I_ number one? And on the on the D_V_D_ I saw that there are some other stuff. I dunno. This one? The same. Okay. Oops. Ah, okay. Uh. There are two files. Okay, I understand. Okay. Run the okay. Um Yeah, almost. I would maybe a less a little bit less than uh a bit more than real time maybe. I will have to check, but if it's five minutes videos, it's like done in yeah, quickly. Okay. Components open C_V_. Yeah. Yeah. Okay. Open C_ yeah. Yeah. So that we'll use the whiteboard. So in fact, yeah, the interface is quite is quite simple to use. Um for instance you see um on the uh on the U_R_L_ I put okay, I I disco there is M_M_M_ dot IDIAP dot C_H_ slash V_C_G_I_ bin slash video browser. Okay, uh slash video browser three frames, three frames dot H_T_M_L_. 
And then this C_G_I_ programme asks for the parameter video name, equals CINETIS demo one. In fact I'll just explain how to put the data on M_M_M_, but it's quite simple. When you log into M_M_M_, with S_S_H_ for instance, it's working like this: M_M_M_ data, video browser, slash. And then this is the name of the new directory you will create. Demo one. And there is a couple of things in there. So I will say this is video name, so video name dot A_V_I_, because we need the A_V_I_ so that we can do the get frame. Otherwise it's not necessary. And also for the... no no no, this is done on the fly, with get frame. So you need that, you need the X_M_L_, so it's video name, data, dot X_M_L_, but in fact it's just a simple convention, I could have put nothing, but I just put data. It's a convention. Data, and then video name. Okay, it's R_M_ nine dot R_M_ for other stuff. But in fact, when I think about it now, with the extension alone you can already see. The name, yeah, is the same as the directory. It could have been video name also, but it's the same as the directory. And that's it, actually. When you do that, I did the same for kayak, you say kayak, in fact now it's kayak, and then you can see. And this file is an output from the software, the shot boundary detection software. It's a very simple format, yeah. Actually we... yeah, we cannot see it here. It's video, scene, shot, it's hierarchical in X_M_L_, with a key frame time stamp, and from the time stamp and the video location it knows how to get the frame. Each of these frames. And if we do, in Mozilla Firefox, view page source... okay, view frame source. 
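A sketch of reading a hierarchical video, scene, shot, key-frame file of the kind described above. The element and attribute names in `SAMPLE` are hypothetical; the actual schema written by the shot boundary detection software is not shown in the discussion.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: names and attributes are illustrative only.
SAMPLE = """<video name="demo1">
  <scene id="1">
    <shot id="1" start="0.00" end="18.00">
      <keyframe timestamp="9.00"/>
    </shot>
    <shot id="2" start="18.00" end="22.00">
      <keyframe timestamp="20.00"/>
    </shot>
  </scene>
</video>"""

def list_keyframes(xml_text):
    """Return (scene id, shot id, key-frame time stamp) for every key frame."""
    root = ET.fromstring(xml_text)
    result = []
    for scene in root.iter("scene"):
        for shot in scene.iter("shot"):
            for keyframe in shot.iter("keyframe"):
                result.append((scene.get("id"), shot.get("id"),
                               float(keyframe.get("timestamp"))))
    return result

if __name__ == "__main__":
    for entry in list_keyframes(SAMPLE):
        print(entry)
```

With the time stamp and the location of the video file, a frame-extraction routine has everything it needs to produce the image on request.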
And every frame is stored into uh a b a backup in fact every frame um it's M_M_M_ who manage this automatically, so that um when uh you ask for this frame, in fact we can okay, I just view frame source When I view image, when I say view image, when I say view image on this frame, up we go there and this is um in fact it's a Perl programme that outputs in fa instead of outputting H_T_M_L_ or text, it outputs uh J_ PEG. From the video with a time stamp it outputs a J_ PEG. This is quite uh nice. And Yeah. And this image is i this image is output uh outputted to the client. Yeah, exactly, yeah, yeah. That's what I did uh here. In the other application, yeah. Yeah, I dunno. Yeah, here. I f there are, yeah, the M_M_M_ developers, so I'm part of that, but uh we could we could uh yeah, it's not a problem. Um I wanted to saw to see to do s say something else. Um oh yeah, and the images are sent to your browser, the client, to this browser, but also kept in uh a directory in uh on M_M_M_. Exactly, a cache. I say backup, but it's a cache. Um that's that's how it works. No no no, it's okay. Um for the cache uh We have plenty of space on M_M_M_, so the this is not a problem ah, yeah for a new window. Yeah, exactly, yeah. And it detects okay. Yeah. So you go inside the get frame function, you see where it i it saves it, and then there is a special naming convention, and you can remove the but i it's c yeah, it's true. No. Because there it's in cache, so that if I put another video with the same name, they th they will happen again. They will uh appear, this frames will appear. Yeah, yeah ex if you have the same decomposition, but But uh yeah, on M_M_M_ f now we don't have such many videos, so. But if you put yeah, in another directory, uh then it would work or or a new naming convention with uh special uh, yeah, code uh. Yeah, M_M_M_ cache. Yeah. But it it can go in the browser cache also. The first time is done online. Yeah, you can see here, yeah. 
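A sketch of the caching behaviour described above: a frame is extracted on the first request and served from the cache afterwards, and because the key depends on the video name, a different video uploaded under the same name would get the old frames back. The class, the key layout and the in-memory store are assumptions; the real cache is a directory on the server.

```python
class FrameCache:
    """Serve frames from a cache, extracting each one only on first request."""

    def __init__(self, extract):
        self._extract = extract  # callable(video_name, timestamp) -> image data
        self._store = {}

    def get_frame(self, video_name, timestamp):
        key = (video_name, timestamp)
        if key not in self._store:
            self._store[key] = self._extract(video_name, timestamp)
        return self._store[key]

if __name__ == "__main__":
    calls = []

    def fake_extract(video_name, timestamp):
        # Stand-in for the real frame extraction; records each call.
        calls.append((video_name, timestamp))
        return f"jpeg:{video_name}@{timestamp}"

    cache = FrameCache(fake_extract)
    cache.get_frame("kayak", 6.4)   # extracted on the fly
    cache.get_frame("kayak", 6.4)   # served from the cache
    print(len(calls))
```

Putting each video in its own directory, or adding a distinguishing code to the naming convention, avoids the stale-frame problem mentioned in the discussion.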
The first time it is done on... You know, you see that. Well, here it was the cache, because I had already pressed preview, but now it's on the fly. So it's quite quick. And then if you come... yeah. Because the images are already generated. And maybe they are in the cache of this browser, this computer. So okay. That's why it was used with... Sebastien did the C_ function, and the... is calling this C_ executable, and it was used with the M_M_M_ server with Pierre, at the beginning. Yeah. Oh, the movies, or... In fact we could, you know... okay, you know all this graphical user interface that you see. Mm, yes. Yes, we could do that. We could put CINETIS slash here, and then CINETIS slash is the... yeah, it works. But the point is that this interface is, I think, six or seven files only: Perl C_G_I_, that outputs X_M_L_, and X_S_L_. So it's not a big deal to make a new one, like a CINETIS browser or, you know, a video browser. We copy all the files, and there is just one parameter, one string, that we need in order to know where the data is, and we could create an M_M_M_ data video browser CINETIS or something else. It's extensible. Yeah, exactly. But, you know, this is working, the expand all, but this is... yeah, a to-do list or something. On video annotation, on video structure correction. But it's not easy on the web, with H_T_M_L_ pages, to do such corrections, like video structure corrections for instance. For that it's possible maybe with interfaces like Flash or Java, but still it takes more work. But yeah, I mean, it's a good point. Well, we can do it for instance for T_S_R_ videos.
Alessandro asked me to do the same thing, and we had a kind of password convention. So this is possible to do for these videos. So it's not... well, there is no password protection for these videos. Maybe what I say... I'm very tired. But it's a demo, and one can say you have to guess the name, but yeah. A good point about password protection issues, or privacy issues. Okay. All the framework is ready with the T_S_R_ videos. Okay, just to say something about password protection. Sometimes, instead of having a purely secure directory, we put a code number, so that it's quite impossible to guess. Instead of video name equals kayak, we would have zero six seven five or, you know, two. And then from outside we have to take care not only that the video name, this U_R_L_, is not accessible; in fact it's in M_M_M_ data video browser, and this is accessible from outside. We should also take care that this directory is not listable, you know, that you cannot list it from outside with F_T_P_ or H_T_T_P_. Simple things for privacy issues, but there are some methods, and everything is possible to make things private. I don't know if you want to add anything, but for the C_ code I would like to stress something: if you want to use the new versions, etcetera, I think we need to go through the code a little bit, and I would be happy to help, for instance with the classes about video structure, etcetera. And in fact there is no tosh; it was called vision lib, to extract each frame of a video, you know.
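The "code number instead of the video name" idea can be sketched as below. This is an assumption-laden illustration, not the convention actually used on the server: the function names and the registry dictionary are hypothetical, and the code is generated with Python's standard `secrets` module. A short four or five digit number, as in the spoken example, could be found by trying all values, so the sketch uses a longer random token.

```python
import secrets


def new_directory_code(n_bytes=16):
    """Return a random hexadecimal code, twice as many characters as n_bytes."""
    return secrets.token_hex(n_bytes)


def register_video(registry, video_name, n_bytes=16):
    """Map a fresh, hard-to-guess code to a video name and return the code.

    `registry` is a plain dict used here as a stand-in for wherever the real
    mapping between public codes and private directory names would live.
    """
    code = new_directory_code(n_bytes)
    while code in registry:  # extremely unlikely, but cheap to check
        code = new_directory_code(n_bytes)
    registry[code] = video_name
    return code


if __name__ == "__main__":
    registry = {}
    print(register_video(registry, "kayak"))
```

As said in the discussion, this only helps if the parent directory cannot be listed from outside; an unguessable name is not a substitute for real access control.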
So it was from Sebastien, it was in the tosh library, but it has nothing to do with tosh, and it's using open C_V_ and some classes from Jean-Marc for the Bhattacharyya distance, etcetera. Image processing and stuff. We can forget tosh, yeah. So it's four thirty, so we need one hour, it's good. Just if you want, these are the parameters of the C_ programme, the usage. Video in, okay. The output file, X_M_L_. Output file dist: it's the distance between each histogram, so that if you feel there are not enough shots in the X_M_L_, instead of reprocessing the video you reprocess this file. And also some other stuff; for instance the alpha threshold we talked about is set to two point four. The step: we always use one, and it should be one, because otherwise it's stuck between consecutive frames. Yeah, it's better to keep it at one. The detection latency is about half of the window, the sliding window. Option for key frame extraction, okay. Sub-threshold, yeah, leave it at one. Maybe I will give you this file as a... it's a kind of README. I did it just to prepare this nice discussion. I'm quite happy to do that; it's good when you do something and there is someone using it. At least with the interface it's quite fun to see that you can do something quickly: if the video is ten minutes, in thirty minutes, because there are not a lot of things to change. Or we can even do it quite automatically. It can be done, yeah. So the Real Media is a parallel thing: from the A_V_I_ you convert it into Real Media, and it is done on Windows. There is a free one, okay.
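The idea of reprocessing the distance file instead of the video can be sketched like this. The exact decision rule of the C programme is not spelled out in the discussion, so the rule below is an assumption: a frame is flagged as a cut when its distance is the largest in a centred sliding window and exceeds alpha times the mean of its neighbours. Only the default alpha of 2.4 and the fact that the decision needs about half a window of look-ahead come from the discussion; the window size of 10 is a placeholder.

```python
def detect_cuts(distances, alpha=2.4, window=10):
    """Return indices whose distance stands out from its local neighbourhood.

    Assumed rule (not necessarily the one in the real programme): index i is
    a cut if distances[i] is at least as large as every neighbour inside the
    window and larger than alpha times the neighbours' mean. Looking half a
    window ahead is what causes the detection latency.
    """
    distances = list(distances)
    half = window // 2
    cuts = []
    for i, d in enumerate(distances):
        lo = max(0, i - half)
        hi = min(len(distances), i + half + 1)
        neighbours = distances[lo:i] + distances[i + 1:hi]
        if not neighbours:
            continue
        local_mean = sum(neighbours) / len(neighbours)
        if d > alpha * local_mean and d >= max(neighbours):
            cuts.append(i)
    return cuts


if __name__ == "__main__":
    # Made-up distances: flat, one weak peak at index 10.
    weak = [0.01] * 10 + [0.02] + [0.01] * 10
    print(detect_cuts(weak))             # default alpha
    print(detect_cuts(weak, alpha=1.5))  # lower alpha, no video reprocessing
```

Because the threshold is relative to the local mean, a slow, continuous change of the distance (camera motion, for instance) raises the mean along with the value and is not flagged, whereas an isolated peak is.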
You can use the free one; we have a license here at IDIAP. If you want to pay, you pay for options such as resize or cropping, but in fact maybe we don't need that. It's called Helix Producer. Yeah, I forgot... yeah, Helix Producer, and that's it. And with Real Media, well, you can buy it or you can order it for free on the internet. No, no no no, for the Real Media. And you didn't see it, but the Real Media is called via a Real server, so the connection is R_T_S_P_; it's video streaming, so it's not H_T_T_P_, so that it can play smoothly. But we are still investigating the video formats, because Real Media is not supported, for instance, in Java applications, etcetera. So there are many issues about that, but it's another problem. Yeah, that was single video structuring, but at another level it's... it's research, yeah. Last thing: this is the kind of X_M_L_ that is output. The structure is video segmentation, with the video's begin, and then for each shot you have an I_D_; well, these are the terms from the M_ PEG seven convention, but anyway. Shot begin and end. The shot key frame, which is the middle here. Sub-shot... yeah, you're right. Twelve frames. And in fact that's what I output on the interface, I think. And there is only one sub-shot, because there were problems with detecting the sub-shots, etcetera. Well, not with this one, but I tried to code some stuff, and I'm not so confident in the way it is coded. But we should even, for instance for the videos, for the spectral clustering, for the clustering...
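Since the shot boundaries are read as "seconds and frames" (eleven seconds and twelve frames, for instance) and the key frame is described as the middle of the shot, a small sketch of that arithmetic may help. The frame rate of 25 frames per second is an assumption, as are the function names; the real files may encode time stamps differently.

```python
def to_frame_index(seconds, frames, fps=25):
    """Convert a (seconds, frames) time stamp into an absolute frame index."""
    if not 0 <= frames < fps:
        raise ValueError("frames must be in the range 0..fps-1")
    return seconds * fps + frames


def from_frame_index(index, fps=25):
    """Convert an absolute frame index back into (seconds, frames)."""
    return divmod(index, fps)


def middle_key_frame(begin, end, fps=25):
    """Return the (seconds, frames) time stamp halfway between begin and end."""
    b = to_frame_index(begin[0], begin[1], fps=fps)
    e = to_frame_index(end[0], end[1], fps=fps)
    return from_frame_index((b + e) // 2, fps=fps)


if __name__ == "__main__":
    print(to_frame_index(11, 12))            # eleven seconds and twelve frames
    print(middle_key_frame((0, 0), (0, 24)))
```

Under the 25 frames per second assumption, eleven seconds and twelve frames is frame 287, and a shot running from frame 0 to frame 24 has its key frame at twelve frames.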
Here I store the distance between consecutive histograms, but we could store the matrix. And I think there are many possible things with spectral clustering, but there is one parameter that is very... I can't remember what the parameter is. No no no no, the gap used to find the right number of clusters is so sensitive. Yeah, anyway. It's a bit later in the report. So any time, I would be happy to help you with that. Yeah, we can go.
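The difference between storing only consecutive distances and storing the full matrix can be sketched as follows. This is a pure Python illustration, not the code from the C programme. The Bhattacharyya distance is taken here in the form sqrt(1 - BC) on normalised histograms, where BC is the Bhattacharyya coefficient; the real code may use another variant, such as the negative logarithm of BC.

```python
import math


def normalise(hist):
    """Scale a histogram so that its bins sum to one."""
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram must have positive mass")
    return [h / total for h in hist]


def bhattacharyya(p, q):
    """Distance sqrt(1 - BC) between two normalised histograms."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))


def distance_matrix(histograms):
    """Symmetric matrix of pairwise distances between all frames."""
    hs = [normalise(h) for h in histograms]
    n = len(hs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = bhattacharyya(hs[i], hs[j])
    return matrix


def consecutive(matrix):
    """The distances between consecutive frames: the first superdiagonal."""
    return [matrix[i][i + 1] for i in range(len(matrix) - 1)]
```

The consecutive distances currently stored are just one diagonal of this matrix, so nothing is lost by storing the matrix, but its size grows with the square of the number of frames. A clustering step would typically turn these distances into similarities first, and in practice one would probably compute the matrix per shot or per key frame rather than per frame.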
Hmm. Yes. Mm 'kay. Okay, this is. Okay. Mm-hmm. Okay. Okay. Okay. 'Kay. Okay. Mm-hmm. Okay. Okay. Hmm. Mm. Okay. Mm. Yeah. Yeah. Okay. Yeah. Okay. Mm-hmm. Okay. 'Kay. Hmm. Yeah. Okay. Hmm. Okay. Okay. Okay. Okay. Mm-hmm. Okay. Yeah, okay. Yeah, I understand. Okay. Yeah. Yeah, okay. Okay. Okay. Okay. Hmm. Mm-hmm. Mm-hmm. Hmm. Okay. Yeah. Okay. We'll see. Okay. see the cut. Okay. Mm-hmm. Okay. Okay. Okay. Okay. Okay. Okay. Hmm. Yeah, okay. Okay. Yeah, okay. Okay. I will look to have a connection. Okay. Yeah. And we need a laptop or you have yours? Okay. Okay. Yeah. Uh okay. Yeah. Mm 'kay. Hmm. Hmm. Okay. Hmm. Yeah. Hmm. Hmm. Yeah, yeah. Oh yeah. There was some burning Okay. And the next is eleven seconds and twelve frames. Hmm. Okay. Hmm. Okay. Ah, okay. Understand. Mm 'kay. Mm. Yeah. Mm-hmm. Okay, mm-hmm. Yeah. Hmm. Yeah. Yeah. Hmm. Hmm. Yeah. Hmm. Hmm. Hmm. Hmm. Hmm. Okay. Hmm. Hmm. Hmm. Hmm. No. Uh I have look at the demo film, and it's about uh forty minutes and about uh hundred and seventy three different shots, so it's yeah. Yes. And uh I don't know if this is a real sample D_V_D_, because uh this will make about uh five seconds for each shot, so I think it's very short. No? I if this is to r to to to make a change in the colour, light and uh they have a lot of uh shots. Hmm. Yeah, okay. Ah, okay. Yeah, okay. Okay, understand. Yeah. Okay. Hmm. Okay. Yeah, yes. Yeah. Hmm. Oh. Maybe they can Hmm. Hmm. Mm-hmm. Mm. Mm. Mm. Mm. Mm. Mm. Yeah. Yeah. Mm-hmm. Mm. Yes. Sorry? Yeah. Mm-hmm. Mm. Yeah, uh I think I think they made some uh special effect in this D_V_D_. Uh there is two or three time uh you see you see the same uh same kind of uh scene, uh just uh two or three seconds Yeah. Hmm. No, there's no menu. There is just one uh one chapter and no menu. Yeah, yes. Yes. Uh Uh I think in the video there's two there's two video, one is a real video and the other is a just uh introduction when you have the menu. No, it's uh thirteen minutes. 
No, but yes, this is the same, but uh there was a problem when do the grab from the D_V_D_ to the files. So this is why they have make two two files. But now I have one file also in the so you can process again one time. Mm. Yeah. Yeah. Yes. Hmm. Yeah. Yeah. Yes. Hmm. Hmm. Hmm. Yeah. Mm. Okay. Okay. Okay. Okay. Yeah. Yeah, okay. But the the name of the video file is video name is uh it's the same as a directory. Okay. Okay. Okay. Okay. Okay. Yeah, okay. Okay. Okay, yes, okay. Yeah, okay. Yeah, okay. Okay. Mm. Okay, okay. So every every frame on the f page is uh online. Is is a Okay. Yeah. Yes. Okay. Okay. Okay. Okay, understand. Yes, okay. Okay. But on the on the on the server the original is a f video film. Okay. So for example you can change you can change a number to take which one, okay. Okay, understand. In the other yeah yeah. Okay, perfect. In this one, okay. Yeah, I think it's not well done, yeah. Yeah. Okay. But um. Yeah. Okay, okay, there is li local cache. Okay, okay. Okay, okay. Okay. So if we for example now we have the c the the case that this video is not correct, because uh I have make uh event grabbing. So uh we need to remove the cache for this video or this is automatic? Yeah. Yeah, but what I mean is i i i i i if I if I put the same video with the same name instead of this one, they should uh different Ah, okay. We need to remove them. Yeah, okay. Okay, understand. Okay. Yeah, it's not a real uh usi usage. Okay. Okay. Yes, understand. Okay. Yeah. Yes, bec because this one i was uh Th this one was Yeah. Right, okay. Okay, but it's okay. It's okay. Okay. Yeah, yeah. Hmm. Hmm. Mm. On the M_M_M_, yeah. Mm. Hmm. Hmm. Yeah. Yeah, okay. Oh, it's Greek, but uh okay. Okay, so i if you come back now, you Yeah, yeah. Okay, but now you have s you have both cache. Yeah, okay. Yes, computer too, yes to uh but okay. But it's fast. Okay. Yeah, understand. Okay. Okay. Uh i ju just uh a question on the the organisation of the M_M_M_. 
So you don't have uh a part for specific uh project. Every for example if we make a lot of uh film uh for CINETIS uh all are on the top I mean uh all the film are y you you don't have a separation for the film for CINETIS or for the project and s Well you say video name equal, all are on the same level. Okay. Okay. Okay. Okay. Yeah, okay. Ah. okay. To make Okay. If we need to make some special thing on the browser, it's t Okay. Yeah, okay, it's okay. Okay, great. Okay. Mm. Okay. Work in progress. Yeah. Input. No. Yeah uh. Mm. Yeah, mainly the t webpage or for output. Yeah. Okay. And all these uh file are only uh internal. This mean uh if I go t on uh C CINTETIS CINETIS uh From outside of IDIAP we can access this. With a password or Mm. Yeah. Yeah. Okay, yes, okay. Okay. Yeah. Yeah, okay. Yeah, okay. Yeah, yes, yeah, okay. Okay. Hmm. Yeah, okay. Uh. Yeah, yeah. Yeah, yeah, I understand, yeah, yeah. Okay. Yeah. Yeah, okay. Right, okay. Yeah. Okay. Hmm. Okay. Yeah. Yes. Hmm. Okay, yeah. Hmm. Yes. Yeah, okay. And the code is based on open C_V_. Oh okay. Okay. Okay. Bhattacharya okay, and so on, okay. Okay. Okay, okay. So we can forget tosh and uh use uh input se the input reader for the the M_ PEG we can re lo use uh open C_V_. Okay. Okay. Hmm. It's good. Yeah. Yeah. Yes. Output, hmm. Uh yeah. Okay. Okay. Okay. Yeah. Yeah, yeah, okay. Ah, okay. Okay. We can skip some frames. Okay. Yeah. Okay. Yeah, yeah, yeah. Mm-hmm. Okay. Okay, it's perfect. Hmm. Yes. Yes. Yes. Yeah. Mm. Yeah, the the browser. Hmm. Hmm. Yes. Yes. Uh so just uh last question maybe, uh so this makes a X_M_L_ file. Okay. This is the the programme to to generating the X_M_L_ file and the the shot. And uh with the kayak you have one problems, you don't have the Real Media, so this is another thing. Okay. Okay. Okay. So you have some special tools from uh from Real Media. Okay. Hmm. Mm. Okay. Hmm. Hmm, okay, okay. Okay, I heard it. Okay. But uh anyway you need you need to have Windows to make these. 
So I will ask you or someone else, okay. Okay, okay, understand. Okay. But this is this is made by a programme done outside IDIAP. Okay. This is yes, right, okay. This not uh IDIAP programme. Okay, okay. Hmm. Yeah. Yeah, okay. Yeah. Mm. Um. Hmm. Hmm. Hmm. Hmm. Okay. Mm. Okay. Thank you. Okay. Yes. I think for test I can take the real demo and make the the full processing. So I will Yeah. Yeah. Yeah. Okay. Hmm. Okay. Mm okay. I have some informations. Okay. And uh in this in this the begin and end the number is the for example the end is zero dot twelve twelve is uh this frame too, yeah. Okay. Okay. Okay. Okay. Okay. What what Okay. Uh. Okay. Mm. Just what what Hmm. Yeah, okay. Okay. It's okay.

